Structure Analysis and Generation for Internet Documents
Internal identifier: 000C11 (Main/Exploration); previous: 000C10; next: 000C12
Authors: Kyong Ho Lee [United States]; Yoon Chul Choy [South Korea]; Sung-Bae Cho [South Korea]
Source:
- Studies in Fuzziness and Soft Computing [ 1434-9922 ]
Abstract
This paper presents a syntactic method for logical structure analysis and generation, aimed at creating Web documents. The method transforms multi-page document images with hierarchical structure into an XML document. To produce a logical structure more accurately and quickly than previous work, whose basic units are text lines, the proposed method takes text regions with hierarchical structure as input. Furthermore, we define a document model that efficiently describes the geometric characteristics and logical structure information of a document class. Experimental results on 372 images scanned from a technical journal show that the method performs logical structure analysis successfully. In particular, because the method generates XML documents as the result of structural analysis, it enhances document reusability and platform independence.
Url: https://api.istex.fr/ark:/67375/HCB-V1BJDMB2-9/fulltext.pdf
DOI: 10.1007/978-3-7908-1772-0_1
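Affiliations:
- Kyong Ho Lee: National Institute of Standards and Technology, 20899, Gaithersburg, MD
- Yoon Chul Choy: Dept. of Computer Science, Yonsei University, 120-749, Seoul
- Sung-Bae Cho: Dept. of Computer Science, Yonsei University, 120-749, Seoul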
Links toward previous steps (curation, corpus...)
- to stream Istex, to step Corpus: 001959
- to stream Istex, to step Curation: 001428
- to stream Istex, to step Checkpoint: 000B26
- to stream Main, to step Merge: 000C28
- to stream Main, to step Curation: 000C11
The document in XML format
<record><TEI wicri:istexFullTextTei="biblStruct"><teiHeader><fileDesc><titleStmt><title xml:lang="en">Structure Analysis and Generation for Internet Documents</title>
<author><name sortKey="Lee, Kyong Ho" sort="Lee, Kyong Ho" uniqKey="Lee K" first="Kyong Ho" last="Lee">Kyong Ho Lee</name>
</author>
<author><name sortKey="Choy, Yoon Chul" sort="Choy, Yoon Chul" uniqKey="Choy Y" first="Yoon Chul" last="Choy">Yoon Chul Choy</name>
</author>
<author><name sortKey="Cho, Sung Bae" sort="Cho, Sung Bae" uniqKey="Cho S" first="Sung-Bae" last="Cho">Sung-Bae Cho</name>
</author>
</titleStmt>
<publicationStmt><idno type="wicri:source">ISTEX</idno>
<idno type="RBID">ISTEX:615F8C09F525049E216106C0A028B51B97E3B775</idno>
<date when="2003" year="2003">2003</date>
<idno type="doi">10.1007/978-3-7908-1772-0_1</idno>
<idno type="url">https://api.istex.fr/ark:/67375/HCB-V1BJDMB2-9/fulltext.pdf</idno>
<idno type="wicri:Area/Istex/Corpus">001959</idno>
<idno type="wicri:explorRef" wicri:stream="Istex" wicri:step="Corpus" wicri:corpus="ISTEX">001959</idno>
<idno type="wicri:Area/Istex/Curation">001428</idno>
<idno type="wicri:Area/Istex/Checkpoint">000B26</idno>
<idno type="wicri:explorRef" wicri:stream="Istex" wicri:step="Checkpoint">000B26</idno>
<idno type="wicri:doubleKey">1434-9922:2003:Lee K:structure:analysis:and</idno>
<idno type="wicri:Area/Main/Merge">000C28</idno>
<idno type="wicri:Area/Main/Curation">000C11</idno>
<idno type="wicri:Area/Main/Exploration">000C11</idno>
</publicationStmt>
<sourceDesc><biblStruct><analytic><title level="a" type="main" xml:lang="en">Structure Analysis and Generation for Internet Documents</title>
<author><name sortKey="Lee, Kyong Ho" sort="Lee, Kyong Ho" uniqKey="Lee K" first="Kyong Ho" last="Lee">Kyong Ho Lee</name>
<affiliation wicri:level="2"><country xml:lang="fr">États-Unis</country>
<wicri:regionArea>National Institute of Standards and Technology, 20899, Gaithersburg, MD</wicri:regionArea>
<placeName><region type="state">Maryland</region>
</placeName>
</affiliation>
<affiliation></affiliation>
</author>
<author><name sortKey="Choy, Yoon Chul" sort="Choy, Yoon Chul" uniqKey="Choy Y" first="Yoon Chul" last="Choy">Yoon Chul Choy</name>
<affiliation wicri:level="3"><country xml:lang="fr">Corée du Sud</country>
<wicri:regionArea>Dept. of Computer Science, Yonsei University, 120-749, Seoul</wicri:regionArea>
<placeName><settlement type="city">Séoul</settlement>
<region type="capital">Région capitale de Séoul</region>
</placeName>
</affiliation>
<affiliation wicri:level="1"><country wicri:rule="url">Corée du Sud</country>
</affiliation>
</author>
<author><name sortKey="Cho, Sung Bae" sort="Cho, Sung Bae" uniqKey="Cho S" first="Sung-Bae" last="Cho">Sung-Bae Cho</name>
<affiliation wicri:level="3"><country xml:lang="fr">Corée du Sud</country>
<wicri:regionArea>Dept. of Computer Science, Yonsei University, 120-749, Seoul</wicri:regionArea>
<placeName><settlement type="city">Séoul</settlement>
<region type="capital">Région capitale de Séoul</region>
</placeName>
</affiliation>
<affiliation wicri:level="1"><country wicri:rule="url">Corée du Sud</country>
</affiliation>
</author>
</analytic>
<monogr></monogr>
<series><title level="s" type="main" xml:lang="en">Studies in Fuzziness and Soft Computing</title>
<idno type="ISSN">1434-9922</idno>
<idno type="eISSN">1860-0808</idno>
</series>
</biblStruct>
</sourceDesc>
<seriesStmt><idno type="ISSN">1434-9922</idno>
</seriesStmt>
</fileDesc>
<profileDesc><textClass></textClass>
</profileDesc>
</teiHeader>
<front><div type="abstract" xml:lang="en">Abstract: This paper presents a syntactic method for logical structure analysis and generation for creation of Web documents. The method transforms document images with multiple pages and hierarchical structure into an XML document. To produce a logical structure more accurately and quickly than previous works of which the basic units are text lines, the proposed method takes text regions with hierarchical structure as input. Furthermore, we define a document model that is able to describe geometric characteristics and logical structure information of document class efficiently. Experimental results with 372 images scanned from the technical journal show that the method has performed logical structure analysis successfully. Particularly, the method generates XML documents as the result of structural analysis, so that it enhances the reusability of documents and independence of platform.</div>
</front>
</TEI>
<affiliations><list><country><li>Corée du Sud</li>
<li>États-Unis</li>
</country>
<region><li>Maryland</li>
<li>Région capitale de Séoul</li>
</region>
<settlement><li>Séoul</li>
</settlement>
</list>
<tree><country name="États-Unis"><region name="Maryland"><name sortKey="Lee, Kyong Ho" sort="Lee, Kyong Ho" uniqKey="Lee K" first="Kyong Ho" last="Lee">Kyong Ho Lee</name>
</region>
</country>
<country name="Corée du Sud"><region name="Région capitale de Séoul"><name sortKey="Choy, Yoon Chul" sort="Choy, Yoon Chul" uniqKey="Choy Y" first="Yoon Chul" last="Choy">Yoon Chul Choy</name>
</region>
<name sortKey="Cho, Sung Bae" sort="Cho, Sung Bae" uniqKey="Cho S" first="Sung-Bae" last="Cho">Sung-Bae Cho</name>
<name sortKey="Choy, Yoon Chul" sort="Choy, Yoon Chul" uniqKey="Choy Y" first="Yoon Chul" last="Choy">Yoon Chul Choy</name>
</country>
</tree>
</affiliations>
</record>
To manipulate this document under Unix (Dilib)
EXPLOR_STEP=$WICRI_ROOT/Wicri/Informatique/explor/SgmlV1/Data/Main/Exploration
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 000C11 | SxmlIndent | more
Or
HfdSelect -h $EXPLOR_AREA/Data/Main/Exploration/biblio.hfd -nk 000C11 | SxmlIndent | more
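For readers without a Dilib installation, the record shown above can also be read with a general-purpose XML parser. The following is a minimal sketch, assuming the record text has been saved or piped as a string. One caveat: the record uses the `wicri:` prefix without declaring it, so a namespace-aware parser rejects it as written. The sketch binds the prefix to a placeholder URI (`urn:example:wicri`, an assumption made for illustration, not an official Wicri namespace). The helper names `parse_record` and `summarize` are likewise hypothetical, and `SAMPLE` is an abridged copy of the record.

```python
import xml.etree.ElementTree as ET

# Assumption: placeholder URI, used only to bind the undeclared wicri: prefix.
WICRI_NS = "urn:example:wicri"


def parse_record(xml_text):
    """Parse a record whose wicri: prefix is used but never declared."""
    # Declare the prefix on the root element; otherwise the parser
    # rejects the input with an "unbound prefix" error.
    patched = xml_text.replace(
        "<record>", '<record xmlns:wicri="{}">'.format(WICRI_NS), 1
    )
    return ET.fromstring(patched)


def summarize(record):
    """Collect the title, author names, and DOI from the TEI header."""
    return {
        "title": record.findtext(".//titleStmt/title"),
        "authors": [n.text for n in record.findall(".//titleStmt/author/name")],
        "doi": record.findtext(".//publicationStmt/idno[@type='doi']"),
    }


# Abridged copy of the record above, kept short for illustration.
SAMPLE = """<record><TEI wicri:istexFullTextTei="biblStruct"><teiHeader><fileDesc>
<titleStmt><title xml:lang="en">Structure Analysis and Generation for Internet Documents</title>
<author><name sortKey="Lee, Kyong Ho">Kyong Ho Lee</name></author>
<author><name sortKey="Choy, Yoon Chul">Yoon Chul Choy</name></author>
<author><name sortKey="Cho, Sung Bae">Sung-Bae Cho</name></author>
</titleStmt>
<publicationStmt><idno type="doi">10.1007/978-3-7908-1772-0_1</idno></publicationStmt>
</fileDesc></teiHeader></TEI></record>"""

if __name__ == "__main__":
    print(summarize(parse_record(SAMPLE)))
```

The paths are restricted to `titleStmt` and `publicationStmt` because the full record repeats the author names under `sourceDesc` and `affiliations`; an unrestricted search for `name` elements would return each author several times.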
To link to this page within the Wicri network
{{Explor lien |wiki= Wicri/Informatique |area= SgmlV1 |flux= Main |étape= Exploration |type= RBID |clé= ISTEX:615F8C09F525049E216106C0A028B51B97E3B775 |texte= Structure Analysis and Generation for Internet Documents }}
This area was generated with Dilib version V0.6.33.